Automatic generation of sets of keywords for theme characterization and detection

Authors

  • Mathias Rossignol
  • Pascale Sébillot
Abstract

The paper describes a system that automatically detects themes in a textual corpus and characterizes each of them by a set of keywords, that is, words whose co-occurrence in a paragraph indicates that the paragraph addresses a given theme. A first version of the system (Pichon and Sébillot, 2000) obtained these sets with the CHAVL hierarchical clustering algorithm, grouping words that have a similar distribution over paragraphs. That version had two weaknesses: the quality of the classes depended heavily on manual parameter settings, and the relevant classes in the classification tree were hard to identify automatically. Both are largely overcome here by a combined classification of the paragraphs based on their lexical cohesion. This new classification first makes it possible to densify the processed data, which helps CHAVL produce more satisfactory classes. It also provides the basis for an original statistical quality measure, which is used both to identify the relevant classes in the tree and to reorganize some of the merges proposed by CHAVL.
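The keyword-set idea can be illustrated with a small Python sketch. It is not the authors' system. CHAVL is replaced here by a plain average-linkage agglomerative clustering on cosine similarity, and the function names `word_vectors`, `cluster_words` and `detects_theme`, the stopping parameter `threshold` and the detection rule `min_hits` (a paragraph is taken to address a theme when at least that many of its keywords occur in it) are all hypothetical. The optional paragraph `groups` stand in for the lexical-cohesion classification, whose details the abstract does not give; summing word counts over such groups is only one assumed reading of "densifying" the data.

```python
import math
from collections import Counter
from itertools import combinations


def word_vectors(paragraphs, vocabulary, groups=None):
    """Map each word to its counts over paragraphs, or over paragraph groups if given.

    `groups` (lists of paragraph indices) is a hypothetical stand-in for the
    lexical-cohesion classification; summing over groups yields denser vectors.
    """
    units = groups if groups is not None else [[i] for i in range(len(paragraphs))]
    counts = [Counter(p.lower().split()) for p in paragraphs]
    return {w: [sum(counts[i][w] for i in unit) for unit in units] for w in vocabulary}


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.5):
    """Average-linkage agglomerative clustering (a stand-in for CHAVL, not CHAVL itself).

    Repeatedly merges the two most similar clusters and stops when the best
    average cosine similarity falls below `threshold` (hypothetical parameter).
    """
    clusters = [[w] for w in sorted(vectors)]

    def similarity(a, b):
        # Average pairwise cosine similarity between the words of two clusters.
        return sum(cosine(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] += clusters[j]  # i < j, so deleting j leaves index i valid
        del clusters[j]
    return [set(c) for c in clusters]


def detects_theme(paragraph, keywords, min_hits=2):
    """Hypothetical rule: the paragraph addresses the theme if >= min_hits keywords co-occur."""
    return len(set(paragraph.lower().split()) & keywords) >= min_hits


# Toy corpus (invented for illustration only).
paragraphs = [
    "the striker scored a goal in the match",
    "the goal came late in the match",
    "the bank raised the interest rate",
    "interest rate decisions worry the bank",
]
themes = cluster_words(word_vectors(paragraphs, ["goal", "match", "bank", "rate", "interest"]))
for keywords in themes:
    print(sorted(keywords), [detects_theme(p, keywords) for p in paragraphs])
```

On this toy corpus, words that always occur in the same paragraphs (`goal` and `match`; `bank`, `interest` and `rate`) end up in the same keyword set, and each set then flags exactly the paragraphs of its theme. A fixed `threshold` like this one is precisely the kind of manual setting the abstract identifies as a weakness.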

Publication date: 2002